Diabetes is a group of metabolic conditions characterized by high blood sugar [1]. The two main forms are type 1 diabetes, in which the body cannot make enough insulin, and type 2 diabetes, in which glucose-consuming organs do not respond properly to insulin [1].
Individuals with diabetes are at high risk of developing high blood pressure, high cholesterol, and high triglycerides, which in turn can eventually lead to heart disease, stroke, blindness, and kidney failure. Diabetes is also associated with an increased risk of dementia and Alzheimer’s disease [2], as well as cancers such as liver, pancreatic, and breast cancer [3]. Because of this high diabetes-related morbidity and mortality, the average medical costs for patients with diabetes are 2.3 times higher than those for the rest of the population [4]. Diabetes prevalence in different counties of the United States is visualized in Figure 1A.
There are various US-wide educational programs aimed at diabetes prevention. However, targeting the right population remains an important roadblock [5]. Diabetes incidence differs widely by age, gender, and ethnicity/race [6,7]. Importantly, individuals of different ages, genders, and racial/ethnic backgrounds are disproportionately affected by neighborhood social vulnerability. The social vulnerability index (SVI) is used by the Centers for Disease Control and Prevention (CDC) to describe the vulnerability of a specific county within the US (Figure 1B) based on factors such as socioeconomic status, household composition and disability, minority status and language, and housing and transportation access [8]. SVI has been linked to other metabolic health outcomes, such as cardiovascular mortality [9]. SVI is used by state and local health departments and non-profits to guide community-based health promotion initiatives [1]. However, the extent to which SVI can affect diabetes is not known.
Figure 1: Diabetes Prevalence and Average SVI in Different Counties of the United States
A. Diabetes Prevalence per County in Different States
B. SVI Averages per County in Different States
The aim of this study is to model the relationship between SVI and diabetes prevalence in different states in the USA.
A. Data Set
The main data were obtained from the CDC web site (https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html) by setting the “Year” variable to 2018 (the latest available data) and “Geography” to United States. This is a retrospective dataset; the study population consists of the individual counties within the US states for which the percentage of diagnosed diabetes cases and the SVI are available. This dataset is attached as an appendix and is labeled “cdc_Diabet_socFctrs2018.csv”. The dataset contained 3142 observations. Two additional datasets, attached with this document and labeled “ggmap_states.csv” and “ggmap_counties.csv”, were obtained using the ggmap package in the R programming language [10]. The first contains longitude and latitude values for individual states, and the second contains the same information for individual counties. These datasets were used to create maps of the diagnosed diabetes percentage and of the SVI for individual counties within states. To add widely accepted state abbreviations for improved plot aesthetics, a dataset of state names and their corresponding abbreviations was obtained from the United States Postal Service web site (https://about.usps.com/who-we-are/postal-history/state-abbreviations.htm). This dataset is also attached and is labeled “stat_name_abbrevs.csv”.
B. Study Population
The data for individual counties were filtered so that no values were missing in the SVI and percent diagnosed diabetes variables. The data were then filtered further so that at least 25 observations were available for each state, allowing the Central Limit Theorem to be applied. The final dataset had 2965 observations and is attached as “cdc_diagDiab_final.csv”. The variable characteristics are summarized in Table 1. The data were also examined using histograms and boxplots; the boxplots are shown in Figure 2.
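The two filtering steps described above can be sketched in a few lines. This is a Python illustration only, not the code used in the study, and the field names (`state`, `svi`, `diabetes_pct`) are hypothetical stand-ins for the columns of the attached CSV files.

```python
from collections import Counter


def filter_counties(rows, min_per_state=25):
    """Drop counties with missing values, then drop states with too few counties."""
    # Step 1: keep only rows where both variables of interest are present.
    complete = [
        r for r in rows
        if r.get("svi") is not None and r.get("diabetes_pct") is not None
    ]
    # Step 2: count the remaining counties per state and keep the large-enough states.
    counts = Counter(r["state"] for r in complete)
    return [r for r in complete if counts[r["state"]] >= min_per_state]


# Toy example with a threshold of 2 instead of 25 (values are made up).
toy_rows = [
    {"state": "A", "svi": 0.2, "diabetes_pct": 8.1},
    {"state": "A", "svi": 0.7, "diabetes_pct": 9.4},
    {"state": "A", "svi": None, "diabetes_pct": 7.0},
    {"state": "B", "svi": 0.5, "diabetes_pct": 10.2},
]
kept = filter_counties(toy_rows, min_per_state=2)
print(len(kept))  # 2: the incomplete row and the single-county state are dropped
```

Counting counties per state after removing incomplete rows, rather than before, means the 25-observation threshold applies to the data that actually enter the analysis.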
Figure 2: Average Percent Diagnosed Diabetes by State
C. Statistical Methods
To analyze the relationship between diabetes prevalence and SVI, regression was performed using the non-parametric kernel regression method with the Nadaraya-Watson equation [11,12], and bootstrap 95% CIs were calculated. The code for kernel regression was written based on the formula and tested on small datasets, and its results were compared to the existing ksmooth function from the stats package in R (data not shown). The purpose of the custom code was to ensure that the bandwidth could be adjusted and that degrees of freedom could be obtained from the regression model. The analysis suggested an overall linear model for the relationship between diabetes prevalence and SVI. Thus, a 2-way ANOVA with interaction was performed to further examine this relationship by county demographic status (urban vs. rural). The SVI variable was categorized using its estimated mean and standard deviation (sd): values below mean - sd indicated low SVI, values above mean + sd indicated high SVI, and values within one sd of the mean indicated average SVI. These categories were used as a factor to help explain the variance in diabetes prevalence per state.
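As an illustration of the kernel regression step, the sketch below implements a Nadaraya-Watson estimator with a percentile bootstrap interval in Python. It is not the R code written for this study. The Gaussian kernel, the bandwidth value, the pair-resampling scheme, and all function names are assumptions made for the example.

```python
import math
import random


def nw_estimate(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate at x0: sum(K_i * y_i) / sum(K_i),
    with K_i = exp(-0.5 * ((x0 - x_i) / bandwidth) ** 2) (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    # Assumes the bandwidth is not so small that every weight underflows to zero.
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def bootstrap_ci(x0, xs, ys, bandwidth, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the estimate at x0, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            nw_estimate(x0, [xs[i] for i in idx], [ys[i] for i in idx], bandwidth)
        )
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up SVI and diabetes values, for illustration only.
svi = [0.1, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9]
diabetes = [7.0, 7.4, 8.1, 8.6, 9.3, 10.1, 10.4]
fit = nw_estimate(0.5, svi, diabetes, bandwidth=0.2)
low, high = bootstrap_ci(0.5, svi, diabetes, bandwidth=0.2, n_boot=200)
print(fit, low, high)
```

Exposing the bandwidth as an explicit argument mirrors the stated motivation for writing custom code: it can be tuned directly instead of being fixed by a library default.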
Table 1: Summary of the Variables
| Variable | Minimum | Maximum | Mean | Median | Standard Deviation | Number of States | Number of Counties |
|---|---|---|---|---|---|---|---|
| Diagnosed Diabetes (%) | 4.5 | 17.9 | 8.742 | 8.400 | 1.799 | 37 | 1776 |
| SVI | 0.0 | 1.0 | 0.507 | 0.509 | 0.288 | 37 | 1776 |
Based on Table 1, Figure 1, and Figure 2, it is apparent that percent diagnosed diabetes differs between states (and between counties). To understand whether these differences in diabetes prevalence can be influenced by SVI, which also differs by state (and county), the relationship between diabetes prevalence and SVI was analyzed (Supp. Fig. 1, Fig. 3).
Figure 3: Relationship between SVI and Percent Diagnosed Diabetes
Based on the kernel regression, the relationship between SVI and percent diagnosed diabetes appears consistent with the assumption of a linear relationship between the two variables. The degrees of freedom were estimated to be 6.56. This result supported the use of ANOVA, which allowed simplification of the model to examine whether there are meaningful differences between groups.
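The text does not state how the degrees of freedom were obtained. One common convention for linear smoothers, assumed here, is the trace of the smoother matrix S, where the fitted values equal S applied to the observed responses. The Python sketch below computes that trace for a Nadaraya-Watson fit with a Gaussian kernel; the function name and kernel choice are illustrative rather than taken from the study's R code.

```python
import math


def nw_effective_df(xs, bandwidth):
    """Effective degrees of freedom of a Nadaraya-Watson fit, taken as trace(S).

    Row i of S holds the normalized kernel weights used to fit point i, so the
    diagonal entry is K(0) / sum_j K((x_i - x_j) / bandwidth), with K(0) = 1.
    """
    trace = 0.0
    for xi in xs:
        row_total = sum(
            math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj in xs
        )
        trace += 1.0 / row_total
    return trace


points = [0.0, 1.0, 2.0]
# A very wide bandwidth averages all points equally (df close to 1);
# a very narrow one lets each point fit itself (df close to n).
print(nw_effective_df(points, bandwidth=1e6))
print(nw_effective_df(points, bandwidth=0.01))
```

Under this convention, a value that is small relative to the number of counties corresponds to a smooth fitted curve with little local structure.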
Table 2: Association between SVI and Diabetes Prevalence Influenced by Rural/Urban Status
A. 2-Way ANOVA Results
| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| SVI_Category | 2 | 1648 | 824 | 321.1 | 6.047e-127 |
| Urban_vs_Rural | 1 | 311.9 | 311.9 | 121.6 | 9.863e-28 |
| SVI_Category:Urban_vs_Rural | 2 | 41.81 | 20.9 | 8.147 | 0.0002961 |
| Residuals | 2959 | 7592 | 2.566 | NA | NA |
B. Contrasts (FDR-corrected)
| term | contrast | null.value | estimate | std.error | df | statistic | adj.p.value |
|---|---|---|---|---|---|---|---|
| SVI_Category*Urban_vs_Rural | low Rural - average Rural | 0 | -0.6937 | 0.1126 | 2959 | -6.158 | 1.141e-09 |
| SVI_Category*Urban_vs_Rural | low Rural - high Rural | 0 | -2.183 | 0.132 | 2959 | -16.54 | 2.775e-58 |
| SVI_Category*Urban_vs_Rural | low Rural - low Urban | 0 | -0.2708 | 0.1293 | 2959 | -2.094 | 0.0363 |
| SVI_Category*Urban_vs_Rural | low Rural - average Urban | 0 | -1.545 | 0.1075 | 2959 | -14.38 | 5.912e-45 |
| SVI_Category*Urban_vs_Rural | low Rural - high Urban | 0 | -2.696 | 0.1308 | 2959 | -20.62 | 3.432e-87 |
| SVI_Category*Urban_vs_Rural | average Rural - high Rural | 0 | -1.49 | 0.1097 | 2959 | -13.58 | 1.943e-40 |
| SVI_Category*Urban_vs_Rural | average Rural - low Urban | 0 | 0.4229 | 0.1065 | 2959 | 3.972 | 7.831e-05 |
| SVI_Category*Urban_vs_Rural | average Rural - average Urban | 0 | -0.8512 | 0.07854 | 2959 | -10.84 | 1.078e-26 |
| SVI_Category*Urban_vs_Rural | average Rural - high Urban | 0 | -2.003 | 0.1083 | 2959 | -18.5 | 1.126e-71 |
| SVI_Category*Urban_vs_Rural | high Rural - low Urban | 0 | 1.912 | 0.1268 | 2959 | 15.09 | 3.903e-49 |
| SVI_Category*Urban_vs_Rural | high Rural - average Urban | 0 | 0.6384 | 0.1044 | 2959 | 6.115 | 1.364e-09 |
| SVI_Category*Urban_vs_Rural | high Rural - high Urban | 0 | -0.5132 | 0.1283 | 2959 | -4.001 | 7.465e-05 |
| SVI_Category*Urban_vs_Rural | low Urban - average Urban | 0 | -1.274 | 0.101 | 2959 | -12.62 | 2.543e-35 |
| SVI_Category*Urban_vs_Rural | low Urban - high Urban | 0 | -2.426 | 0.1255 | 2959 | -19.33 | 1.331e-77 |
| SVI_Category*Urban_vs_Rural | average Urban - high Urban | 0 | -1.152 | 0.1029 | 2959 | -11.19 | 2.678e-28 |
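The contrasts above are reported as FDR-corrected, but the specific procedure is not named. The Python sketch below shows the Benjamini-Hochberg adjustment, which is the method R's `p.adjust` applies for `method = "fdr"`. Treat it as an assumption about what was done, not a reproduction of the study's code. The input p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # illustrative raw p-values
print(benjamini_hochberg(raw))
```

Under this procedure the largest raw p-value is left unchanged, and smaller ones are scaled up by the number of tests divided by their rank.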
The 2-way ANOVA showed significant main effects of SVI category and urban/rural status on diabetes prevalence, as well as a significant interaction between the two (Table 2).
Based on the analysis, SVI affected percent diagnosed diabetes values in a state-dependent manner. Furthermore, the number of rural counties within each state further affected this relationship. The Nadaraya-Watson regression further indicated that the relationship between percent diagnosed diabetes and SVI depended on multiple factors.
The study revealed that there is a complex relationship between percent diagnosed diabetes and SVI. This relationship varies between states and counties, especially depending on whether counties are rural or urban. This is expected, since rural counties have a greater need for physical activity, while urbanization has previously been linked with an increase in diabetes [3]. Furthermore, the relationship between SVI and diabetes has an overall positive slope, independent of the factors influencing this relationship. This is also expected, as higher social vulnerability in many forms affects diabetes prevalence. For instance, one component of SVI is low socioeconomic status, which has previously been linked with diabetes incidence [3].
One important limitation of this analysis is that several states were not included (Arizona, Connecticut, Delaware, District of Columbia, Hawaii, Maine, Maryland, Massachusetts, Nevada, New Hampshire, New Jersey, Rhode Island, Vermont, Wyoming) because of the small amount of data available for them. In the future, it would be important to adjust the analyses if more data become available. Another important limitation of the study is that the data source did not differentiate between type 1 and type 2 diabetes. It is known that while type 1 diabetes has mainly a genetic cause, the causes of type 2 diabetes are usually a combination of genetic predisposition and environmental factors. Future work should therefore differentiate between type 1 and type 2 diabetes. Additionally, it would be important to start collecting and de-identifying genetic information from patients with and without diabetes, to better understand the interaction between environmental and genetic factors in humans.
Supplementary Figure 1: The Diabetes Prevalence and SVI Relationship in Urban vs. Rural Areas